Automatic free-text-tagging of online news archives
Authors
Abstract
In this paper, we introduce the problem of free-text-tagging of online news archives. From an application point of view, it offers many benefits for online news portals; at the same time, the task has characteristics that set it apart from existing approaches to free-text-tagging. We describe our system, which was developed for the archive (consisting of 370 thousand articles) of the most visited Hungarian news portal, www.origo.hu, along with the research questions encountered and solved during the project. Because evaluating tagging is not straightforward, at the end of the project the news company manually inspected the output of the automatic system and found its accuracy to be 71.9%.
Related resources
Improving automatic summarization of Persian texts using natural language processing methods and a similarity graph
A significant amount of available information is stored in textual databases that contain large collections of documents from different sources (such as news, articles, books, emails and web pages). The increasing visibility and importance of this class of information motivates us to work on better automatic evaluation tools for textual resources. The automatic summarization of tex...
A part-of-speech tagging system for the Persian language
Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
Semi-automatic Ontology Extension Using Text Mining
This paper addresses the process of the ontology extension for a selected domain of interest which is defined by keywords and possibly a glossary of relevant terms. A new methodology for semi-automatic ontology extension, aggregating the elements of text mining and user-dialog approaches for ontology extension, is proposed and evaluated. We conduct a set of ranking, tagging and illustrative que...
A Chinese Automatic Text Summarization system for mobile devices
Much online information is too voluminous and too lengthy to suit mobile devices. To solve this problem, we propose a method that collects original news text from online sources and automatically extracts summary sentences from it. On this basis, we adopt WML (Wireless Markup Language) to build a news website for mobile devices browsing through the news summary....
A system for the retrieval of Italian broadcast news
This paper presents a prototype for the retrieval of Italian broadcast news, which has been developed at ITC-irst. The architecture employs a speech recognition engine for the automatic transcription of audio news. Moreover, it features document indexing based on part-of-speech tagging of text coupled with morphological analysis, and query expansion exploiting the Italian WordNet thesaurus. Que...
Publication date: 2010